
chore: Lagrange polynomial cache refactor#522

Open
eudelins-zama wants to merge 15 commits into main from eudelins/chore/2932/refacto-lagrange-store

Conversation


@eudelins-zama eudelins-zama commented Apr 13, 2026

Description of changes

This PR refactors Lagrange polynomial caching in threshold-algebra to avoid repeated interpolation setup work in hot paths.

The previous implementation memoized Lagrange polynomials lazily per field type behind RwLock<HashMap<...>>. This change replaces that with pre-computed OnceLock (or LazyLock for smaller fields) stores and routes interpolation through cached Lagrange bases when they are available, with a fallback to direct computation when they are not.

Note: for GF8 and GF16 the fields are small enough that we can pre-compute all possible bases, so we don't need to know the number of parties and the threshold beforehand.
For bigger fields we have an init function, but it is TBD what we want to do long term (partly because if n choose t grows too big we can't store all possible values).

  • kms-server initializes the stores from the configured threshold peer list and threshold
  • threshold-fhe initializes the stores from the loaded party topology on startup
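As a rough picture of the pre-computed store described above, here is a minimal sketch of the OnceLock pattern under stated assumptions: the names (init_lagrange_store, cached_basis), the key type, and the i64 placeholder coefficients are hypothetical and not the crate's actual API.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical key: the sorted set of party indices whose shares are interpolated.
type PartySet = Vec<usize>;
// Stand-in for the pre-computed Lagrange basis coefficients for that party set.
type LagrangeBasis = Vec<i64>;

// A "free" static, initialized once from (num_parties, threshold) at startup.
static LAGRANGE_STORE: OnceLock<HashMap<PartySet, LagrangeBasis>> = OnceLock::new();

// Initialize the store exactly once; later calls are ignored.
fn init_lagrange_store(bases: HashMap<PartySet, LagrangeBasis>) {
    let _ = LAGRANGE_STORE.set(bases);
}

// Look up a pre-computed basis; `None` means the caller falls back to
// direct computation, mirroring the fallback path described in the PR.
fn cached_basis(parties: &PartySet) -> Option<&'static LagrangeBasis> {
    LAGRANGE_STORE.get().and_then(|store| store.get(parties))
}
```

The fallback-on-`None` shape is what lets hot paths stay correct even if the store was never initialized.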

This PR also:

  • adds a shared helper to build the Lagrange stores
  • adds lagrange_interpolation_with_polys so pre-computed bases can be reused directly
  • simplifies the experimental BGV field implementations to use fallback computation
  • updates the algebra benchmark to compare pre-computed interpolation vs recomputation
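To illustrate why reusing a pre-computed basis pays off, here is a minimal sketch of interpolation at zero split into a basis step and a reuse step. It uses f64 instead of the crate's field types, and the function names are invented for the example; this is not the real lagrange_interpolation_with_polys signature.

```rust
// Precompute the Lagrange basis values l_i(0) for evaluation points `xs`:
// l_i(0) = prod_{j != i} x_j / (x_j - x_i).
fn lagrange_basis_at_zero(xs: &[f64]) -> Vec<f64> {
    xs.iter()
        .enumerate()
        .map(|(i, &xi)| {
            xs.iter()
                .enumerate()
                .filter(|&(j, _)| j != i)
                .map(|(_, &xj)| xj / (xj - xi))
                .product()
        })
        .collect()
}

// Reconstruct p(0) from shares, reusing a pre-computed basis. The basis depends
// only on the evaluation points, so one computation serves an entire batch.
fn interpolate_at_zero_with_polys(shares: &[f64], basis: &[f64]) -> f64 {
    shares.iter().zip(basis).map(|(s, b)| s * b).sum()
}
```

The batched benchmark below amortizes the basis step across all 1,000 reconstructions per batch, which is where the speedup comes from.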

Bench results

Below, we reconstruct 8 batches (in parallel) of 1,000 shares each with 13 parties and threshold 4, for a varying number of shares (from 5 to 13), over the degree-4 extension of Z128 (the ring we use in prod).

CACHE:

batch_decode2t_mem/n=13_t=4_num_shares=5_batch=1000
                        time:   [132.62 ms 134.24 ms 136.18 ms]
batch_decode2t_mem/n=13_t=4_num_shares=6_batch=1000
                        time:   [149.30 ms 151.65 ms 154.32 ms]
batch_decode2t_mem/n=13_t=4_num_shares=7_batch=1000
                        time:   [164.97 ms 168.18 ms 172.32 ms]
batch_decode2t_mem/n=13_t=4_num_shares=8_batch=1000
                        time:   [180.99 ms 182.95 ms 184.97 ms]
batch_decode2t_mem/n=13_t=4_num_shares=9_batch=1000
                        time:   [209.41 ms 213.65 ms 218.50 ms]
batch_decode2t_mem/n=13_t=4_num_shares=10_batch=1000
                        time:   [230.05 ms 232.76 ms 235.42 ms]
batch_decode2t_mem/n=13_t=4_num_shares=11_batch=1000
                        time:   [247.90 ms 250.87 ms 253.65 ms]
batch_decode2t_mem/n=13_t=4_num_shares=12_batch=1000
                        time:   [266.12 ms 269.24 ms 272.06 ms]
batch_decode2t_mem/n=13_t=4_num_shares=13_batch=1000
                        time:   [270.86 ms 275.63 ms 279.82 ms]

NO CACHE:
batch_decode2t_mem/n=13_t=4_num_shares=5_batch=1000
                        time:   [288.73 ms 291.26 ms 293.90 ms]
batch_decode2t_mem/n=13_t=4_num_shares=6_batch=1000
                        time:   [395.35 ms 400.68 ms 405.56 ms]
batch_decode2t_mem/n=13_t=4_num_shares=7_batch=1000
                        time:   [511.98 ms 516.58 ms 521.70 ms]
batch_decode2t_mem/n=13_t=4_num_shares=8_batch=1000
                        time:   [596.16 ms 604.78 ms 616.80 ms]
batch_decode2t_mem/n=13_t=4_num_shares=9_batch=1000
                        time:   [727.02 ms 736.57 ms 747.43 ms]
batch_decode2t_mem/n=13_t=4_num_shares=10_batch=1000
                        time:   [851.84 ms 867.07 ms 886.28 ms]
batch_decode2t_mem/n=13_t=4_num_shares=11_batch=1000
                        time:   [934.60 ms 952.99 ms 973.79 ms]
batch_decode2t_mem/n=13_t=4_num_shares=12_batch=1000
                        time:   [1.0730 s 1.0798 s 1.0872 s]
batch_decode2t_mem/n=13_t=4_num_shares=13_batch=1000
                        time:   [1.2284 s 1.2377 s 1.2464 s]

RWLOCK Memoization (main):
batch_decode2t_mem/n=13_t=4_num_shares=5_batch=1000
                        time:   [378.52 ms 381.67 ms 384.68 ms]
batch_decode2t_mem/n=13_t=4_num_shares=6_batch=1000
                        time:   [358.76 ms 362.00 ms 366.44 ms]
batch_decode2t_mem/n=13_t=4_num_shares=7_batch=1000
                        time:   [350.76 ms 360.90 ms 370.62 ms]
batch_decode2t_mem/n=13_t=4_num_shares=8_batch=1000
                        time:   [340.84 ms 349.58 ms 361.53 ms]
batch_decode2t_mem/n=13_t=4_num_shares=9_batch=1000
                        time:   [337.64 ms 338.53 ms 339.37 ms]
batch_decode2t_mem/n=13_t=4_num_shares=10_batch=1000
                        time:   [339.58 ms 340.91 ms 342.46 ms]
batch_decode2t_mem/n=13_t=4_num_shares=11_batch=1000
                        time:   [337.62 ms 338.21 ms 338.80 ms]
batch_decode2t_mem/n=13_t=4_num_shares=12_batch=1000
                        time:   [335.43 ms 337.11 ms 338.92 ms]
batch_decode2t_mem/n=13_t=4_num_shares=13_batch=1000
                        time:   [352.13 ms 356.05 ms 360.76 ms]

Issue ticket number and link

Closes https://github.com/zama-ai/kms-internal/issues/2932

PR Checklist

I attest that all checked items are satisfied. Any deviation is clearly justified above.

  • Title follows conventional commits (e.g. chore: ...).
  • Tests added for every new pub item and test coverage has not decreased.
  • Public APIs and non-obvious logic documented; unfinished work marked as TODO(#issue).
  • unwrap/expect/panic only in tests or for invariant bugs (documented if present).
  • No dependency version changes OR (if changed) only minimal required fixes.
  • No architectural protocol changes OR linked spec PR/issue provided.
  • No breaking deployment config changes OR devops label + infra notified + infra-team reviewer assigned.
  • No breaking gRPC / serialized data changes OR commit marked with ! and affected teams notified.
  • No modifications to existing versionized structs OR backward compatibility tests updated.
  • No critical business logic / crypto changes OR ≥2 reviewers assigned.
  • No new sensitive data fields added OR Zeroize + ZeroizeOnDrop implemented.
  • No new public storage data OR data is verifiable (signature / digest).
  • No unsafe; if unavoidable: minimal, justified, documented, and test/fuzz covered.
  • Strongly typed boundaries: typed inputs validated at the edge; no untyped values or errors cross modules.
  • Self-review completed.

Dependency Update Questionnaire (only if deps changed or added)

Answer in the Cargo.toml next to the dependency (or here if updating):

  1. Ownership changes or suspicious concentration?
  2. Low popularity?
  3. Unusual version jump?
  4. Lacking documentation?
  5. Missing CI?
  6. No security / disclosure policy?
  7. Significant size increase?

More details and explanations for the checklist and dependency updates can be found in CONTRIBUTING.md

@cla-bot cla-bot Bot added the cla-signed The CLA has been signed. label Apr 13, 2026
@eudelins-zama eudelins-zama self-assigned this Apr 13, 2026



github-actions Bot commented Apr 13, 2026

Consolidated Tests Results 2026-04-27 - 15:25:17

Test Results

passed 18 passed

Details

tests: 18
clock: not captured
tool: junit-to-ctrf
build: build-and-test → test-reporter (link #1706)
pull-request: chore: use once lock for lagrange store (link #522)

test-reporter: Run #1706

Tests 📝: 18 | Passed ✅: 18 | Failed ❌: 0 | Skipped ⏭️: 0 | Pending ⏳: 0 | Other ❓: 0 | Flaky 🍂: 0 | Duration ⏱️: not captured

🎉 All tests passed!

Tests

View All Tests
Test Name / Duration
nightly_full_gen_tests_k8s_default_threshld_sequential_crs 32.6s
test_k8s_threshld_insecure 1m 59s
k8s_test_crs_uniqueness 32.5s
k8s_test_insecure_keygen_encrypt_and_public_decrypt 2m 6s
k8s_test_insecure_keygen_encrypt_multiple_types 2m 21s
k8s_test_keygen_and_crs 1m 58s
k8s_test_keygen_uniqueness 5m 9s
nightly_full_gen_tests_k8s_default_threshld_sequential_crs 33.1s
test_k8s_threshld_insecure 1m 59s
k8s_test_crs_uniqueness 33.1s
k8s_test_insecure_keygen_encrypt_and_public_decrypt 2m 6s
k8s_test_insecure_keygen_encrypt_multiple_types 2m 21s
k8s_test_keygen_and_crs 1m 58s
k8s_test_keygen_uniqueness 5m 9s
nightly_full_gen_tests_k8s_default_centralzd_sequential_crs 1.9s
test_k8s_centralzd_insecure 1m 9s
k8s_test_centralized_insecure 1m 9s
nightly_full_gen_tests_default_k8s_centralized_sequential_crs 1.9s

🍂 No flaky tests in this run.

Github Test Reporter by CTRF 💚

🔄 This comment has been updated

@eudelins-zama eudelins-zama force-pushed the eudelins/chore/2932/refacto-lagrange-store branch from deca7ef to d0942dc Compare April 14, 2026 14:13
@eudelins-zama eudelins-zama marked this pull request as ready for review April 14, 2026 14:33
@eudelins-zama eudelins-zama requested review from a team as code owners April 14, 2026 14:33
dvdplm
dvdplm previously approved these changes Apr 15, 2026
Contributor

@dvdplm dvdplm left a comment


The code here LGTM. I have one nag about the architecture which is not new to this PR.
We initialize one static for each Field impl that we have, but each LAGRANGE_STORE is defined outside the trait impl. We expect it to be there, but have no compiler guarantee that it is actually there. Furthermore, each LAGRANGE_STORE is actually parametrized on the number of parties and the threshold. It works here, because we tightly control the call sites (two in total), but again there is no compile time guarantee that the LAGRANGE_STORE I'm using is parametrized the way I expect. Code has to trust that the store was initialized somewhere in the proper way. It's up to us to know when/how to call init_all_lagrange_stores.

The third nitpick I have is about init_all_lagrange_stores: is the algebra crate the right place for it? One could make the argument that initializing a cache that depends on protocol parameters (parties and threshold) should happen in (or close to) the protocol. I am not sure what my own opinion actually is on this; I just thought I'd mention it.

I'm approving this because it is a clear improvement on the old code, but yeah, there's something about a runtime-parametrized cache held in a "free" static that is awkward.
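A minimal sketch of the pattern being discussed, and one way an associated function could restore a compiler guarantee. This is illustrative only; Field, Gf8, and lagrange_store here are stand-ins, not the crate's actual trait or types.

```rust
use std::sync::OnceLock;

// The pattern under discussion: a "free" static living next to, but outside,
// the trait impl. Nothing forces an implementor to provide such a static, and
// the (num_parties, threshold) it was initialized with is invisible in the types.
static GF8_LAGRANGE_STORE: OnceLock<Vec<u64>> = OnceLock::new();

trait Field {
    // A compiler-checked alternative: route store access through an associated
    // function, so every implementor must say where its cache lives.
    fn lagrange_store() -> &'static OnceLock<Vec<u64>>;
}

struct Gf8;

impl Field for Gf8 {
    fn lagrange_store() -> &'static OnceLock<Vec<u64>> {
        &GF8_LAGRANGE_STORE
    }
}
```

Even with the associated function, the runtime parametrization concern remains: the types still say nothing about which (n, t) the store was initialized for.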

Comment thread core/threshold-bgv/src/algebra/levels.rs
Comment thread core/threshold-algebra/src/structure_traits.rs
Comment thread core/threshold-algebra/src/poly.rs
Comment thread core/threshold-algebra/src/poly.rs
Comment thread core/threshold-algebra/src/structure_traits.rs
Comment thread core/threshold-algebra/src/structure_traits.rs
@titouantanguy
Contributor

titouantanguy commented Apr 16, 2026

Furthermore, each LAGRANGE_STORE is actually parametrized on the number of parties and the threshold. It works here, because we tightly control the call sites (two in total), but again there is no compile time guarantee that the LAGRANGE_STORE I'm using is parametrized the way I expect. Code has to trust that the store was initialized somewhere in the proper way. It's up to us to know when/how to call init_all_lagrange_stores.

Agreed with that. One solution to this might be to only have this cache for the small enough fields, and for those fields generate all the possible Lagrange bases.
Since memory complexity is $O(2^{fieldSize})$, this would basically work OK for $GF(2^3)$ and $GF(2^4)$.
And for those fields we can cache all possible bases, so this doesn't even have to be done at runtime (or at least it doesn't have to be parametrized anymore).

In any case, for the bigger fields doing the caching is practically infeasible due to memory complexity, so for those we have to do on-the-fly computation.

I guess the main drawback is that this assumes the complexity of the cache is maximal.
Of course there are situations where this assumption falls short by a lot, e.g. one has to use $GF(2^5)$ with $16$ parties. In this corner case it would be acceptable to compute the cache, which would hold ~$2^{16}$ elements, but we can't compute all Lagrange bases, ~$2^{32}$ elements.
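The sizing argument above can be checked with back-of-the-envelope arithmetic (illustrative only, not crate code): a field offering k candidate evaluation points admits 2^k point subsets, each of which can index one cached basis.

```rust
// Number of point subsets (hence cached bases, in the cache-everything strategy)
// for a field offering `num_points` candidate evaluation points.
fn all_subsets(num_points: u32) -> u128 {
    1u128 << num_points
}

// GF(2^3) has 8 elements and GF(2^4) has 16, so caching every possible basis
// costs on the order of 2^8 and 2^16 entries: feasible. GF(2^5) has 32 elements,
// so 2^32 entries is not, but restricting to the 16 points actually assigned to
// parties brings the cache back down to ~2^16 entries.
```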

Comment thread core/service/src/bin/kms-server.rs
@titouantanguy
Contributor

By the way, I believe the EXCEPTIONAL_SET_STORE should also be modified in a similar way (will also add this to the issue 2932)

Comment thread core/threshold-algebra/src/galois_fields/gf8.rs Outdated
Comment thread core/threshold-algebra/src/galois_fields/gf16.rs Outdated
Comment thread core/service/src/bin/kms-server.rs Outdated
Comment thread core/threshold-algebra/src/galois_fields/gf8.rs Outdated
Comment thread core/threshold-algebra/src/galois_fields/gf16.rs Outdated
Comment thread core/threshold-algebra/src/galois_fields/gf16.rs
dvdplm
dvdplm previously approved these changes Apr 24, 2026
Contributor

@dvdplm dvdplm left a comment


lgtm

@dvdplm dvdplm changed the title chore: use once lock for lagrange store chore: Lagrange polynomial cache refactor Apr 28, 2026

Labels

cla-signed The CLA has been signed.


Development

Successfully merging this pull request may close these issues.

4 participants